21 research outputs found

    Predicting the Next Best View for 3D Mesh Refinement

    3D reconstruction is a core task in many applications such as robot navigation or site inspection. Finding the best poses from which to capture part of the scene is one of the most challenging problems in the field, and it goes under the name of Next Best View. Recently, many volumetric methods have been proposed: they choose the Next Best View by reasoning over a 3D voxelized space and finding the pose that minimizes the uncertainty encoded in the voxels. Such methods are effective, but they do not scale well, since the underlying representation requires a huge amount of memory. In this paper we propose a novel mesh-based approach which focuses on the worst reconstructed regions of the environment mesh. We define a photo-consistent index to evaluate the accuracy of the 3D mesh, and an energy function over the worst regions of the mesh which takes into account the mutual parallax with respect to the previous cameras, the angle of incidence of the viewing ray to the surface, and the visibility of the region. We test our approach on a well-known dataset and achieve state-of-the-art results.
    Comment: 13 pages, 5 figures, to be published in IAS-1
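    As a rough illustration of the kind of scoring such an energy function might perform, the sketch below rates a candidate camera pose for a single poorly reconstructed mesh region using parallax, incidence, and visibility cues. The terms, weights, and names are illustrative assumptions, not the paper's actual formulation.

```python
# Toy Next-Best-View score in the spirit described above (higher is better).
# All terms and weights are assumptions, not the paper's energy function.
import numpy as np

def nbv_score(candidate_pos, region_center, region_normal,
              previous_cams, visible, w=(1.0, 1.0)):
    """Score one candidate camera for one worst-reconstructed mesh region.

    candidate_pos : (3,) candidate camera position
    region_center : (3,) centroid of the region
    region_normal : (3,) unit outward surface normal of the region
    previous_cams : (N, 3) positions of cameras already used
    visible       : bool, whether the region is visible from the candidate
    """
    if not visible:
        return -np.inf  # an occluded region contributes nothing

    view_dir = region_center - candidate_pos
    view_dir /= np.linalg.norm(view_dir)

    # Incidence term: prefer viewing rays close to the surface normal.
    incidence = max(0.0, float(-view_dir @ region_normal))

    # Parallax term: reward triangulation angle w.r.t. the previous cameras.
    to_prev = previous_cams - region_center
    to_prev /= np.linalg.norm(to_prev, axis=1, keepdims=True)
    to_cand = candidate_pos - region_center
    to_cand /= np.linalg.norm(to_cand)
    parallax = float(np.mean(np.arccos(np.clip(to_prev @ to_cand, -1.0, 1.0))))

    return w[0] * incidence + w[1] * parallax
```

    In a full pipeline one would evaluate this score for every sampled candidate pose over the worst mesh regions and pick the maximizer; the weights trade off baseline against viewing angle.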

    Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

    Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines can now do the same with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, purely based on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of a vision 'teacher' method and a sound 'student' method: the student method is trained to generate the same results as the teacher method. This way, the auditory system can be trained without using human annotations. We also propose two auxiliary tasks, namely a) a novel task of Spatial Sound Super-resolution, which increases the spatial resolution of sounds, and b) dense depth prediction of the scene. We then formulate the three tasks into one end-to-end trainable multi-tasking network that aims to boost the overall performance. Experimental results on the dataset show that 1) our method achieves promising results for semantic prediction and the two auxiliary tasks; 2) the three tasks are mutually beneficial, with joint training achieving the best performance; and 3) the number and orientations of the microphones are both important. The data and code will be released to facilitate research in this new direction.
    Comment: Project page: https://www.trace.ethz.ch/publications/2020/sound_perception/index.htm
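    The core supervision-transfer idea, a frozen vision teacher supervising a sound student, can be sketched with a standard distillation loss. The networks, tensor shapes, and temperature below are placeholders, not the paper's architecture:

```python
# Minimal cross-modal distillation sketch: a frozen vision "teacher"
# supervises a binaural-sound "student", so no human labels are needed.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=1.0):
    """KL divergence between softened teacher and student class predictions."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# Hypothetical stand-ins producing per-pixel semantic logits (B, C, H, W).
teacher = torch.nn.Conv2d(3, 8, 1)   # placeholder for a pretrained vision model
student = torch.nn.Conv2d(1, 8, 1)   # placeholder for the sound model

image = torch.randn(2, 3, 64, 64)        # RGB frames seen by the teacher
spectrogram = torch.randn(2, 1, 64, 64)  # binaural spectrogram features

with torch.no_grad():                 # the teacher is frozen
    t_logits = teacher(image)
loss = distillation_loss(student(spectrogram), t_logits)
loss.backward()                       # gradients flow only into the student
```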

    The Lung Screen Uptake Trial (LSUT): protocol for a randomised controlled demonstration lung cancer screening pilot testing a targeted invitation strategy for high risk and ‘hard-to-reach’ patients

    Background: Participation in low-dose CT (LDCT) lung cancer screening offered in the trial context has been poor, especially among smokers from socioeconomically deprived backgrounds, a group for whom the risk-benefit ratio is improved due to their high risk of lung cancer. Attracting high-risk participants is essential to the success and equity of any future screening programme. This study will investigate whether the observed low and biased uptake of screening can be improved using a targeted invitation strategy.
    Methods/design: A randomised controlled trial design will be used to test whether targeted invitation materials are effective at improving engagement with an offer of lung cancer screening for high-risk candidates. Two thousand patients aged 60–75, recorded as smokers within the last five years by their GP, will be identified from primary care records and individually randomised to receive either intervention invitation materials (which take a targeted, stepped and low-burden approach to information provision prior to the appointment) or control invitation materials. The primary outcome is uptake of a nurse-led 'lung health check' hospital appointment, during which patients will be offered a spirometry test, an exhaled carbon monoxide (CO) reading, and an LDCT if eligible. Initial data on demographics (i.e. age, sex, ethnicity, deprivation score) and smoking status will be collected in primary care and analysed to explore differences between attenders and non-attenders with respect to invitation group. Those who attend the lung health check will have further data on smoking collected during their appointment (including pack-year history, nicotine dependence and confidence to quit). Secondary outcomes will include willingness to be screened, uptake of LDCT, and measures of informed decision-making to ensure the latter is not compromised by either invitation strategy.
    Discussion: If effective at improving informed uptake of screening and reducing bias in participation, this invitation strategy could be adopted by local screening pilots or a national programme.
    Trial registration: This study was registered with the ISRCTN (International Standard Registered Clinical/soCial sTudy Number: ISRCTN21774741) on 23 September 2015 and the NIH ClinicalTrials.gov database (NCT0255810) on 22 September 2015.

    A Benchmark Comparison of Monocular Visual-Inertial Odometry Algorithms for Flying Robots

    Flying robots require a combination of accuracy and low latency in their state estimation in order to achieve stable and robust flight. However, due to the power and payload constraints of aerial platforms, state estimation algorithms must provide these qualities under the computational constraints of embedded hardware. Cameras and inertial measurement units (IMUs) satisfy these power and payload constraints, so visual-inertial odometry (VIO) algorithms are popular choices for state estimation in these scenarios, in addition to their ability to operate without external localization from motion capture or global positioning systems. It is not clear from existing results in the literature, however, which VIO algorithms perform well under the accuracy, latency, and computational constraints of a flying robot with onboard state estimation. This paper evaluates an array of publicly available VIO pipelines (MSCKF, OKVIS, ROVIO, VINS-Mono, SVO+MSF, and SVO+GTSAM) on different hardware configurations, including several single-board computer systems that are typically found on flying robots. The evaluation considers the pose estimation accuracy, per-frame processing time, and CPU and memory load while processing the EuRoC datasets, which contain six degree-of-freedom (6DoF) trajectories typical of flying robots. We present our complete results as a benchmark for the research community.
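    For readers unfamiliar with how pose estimation accuracy is typically scored in such benchmarks, the sketch below computes the absolute trajectory error (ATE) as the RMSE of translational error over time-aligned positions. The trajectory-alignment step (e.g. an SE(3)/Umeyama registration) that benchmarks normally apply first is omitted for brevity, and the example data are synthetic.

```python
# Sketch of one common VIO accuracy metric: absolute trajectory error (ATE),
# computed as the RMSE of position error at matched timestamps.
import numpy as np

def ate_rmse(estimated, ground_truth):
    """estimated, ground_truth : (N, 3) positions sampled at matched timestamps."""
    err = estimated - ground_truth
    return float(np.sqrt(np.mean(np.sum(err**2, axis=1))))

# Hypothetical usage with a noisy copy of a straight-line trajectory:
gt = np.linspace([0.0, 0.0, 0.0], [10.0, 0.0, 1.0], 100)
est = gt + 0.05 * np.random.randn(*gt.shape)
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m")
```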

    AirTouch: Interacting With Computer Systems At A Distance

    We present AirTouch, a new vision-based interaction system. AirTouch uses computer vision techniques to extend commonly used interaction metaphors, such as multitouch screens, while removing any need to physically touch the display. The user interacts with a virtual plane that rests between the user and the display. On this plane, hands and fingers are tracked and gestures are recognized in a manner similar to a multitouch surface. Many other vision- and gesture-based human-computer interaction systems presented in the literature have been limited by requirements that users not leave the frame or not perform gestures accidentally, as well as by cost or specialized equipment. AirTouch does not suffer from these drawbacks. Instead, it is robust, easy to use, builds on a familiar interaction paradigm, and can be implemented using a single camera with off-the-shelf equipment such as a webcam-enabled laptop. In order to maintain usability and accessibility while minimizing cost, we present a set of basic AirTouch guidelines. We have developed two interfaces using these guidelines: one for general computer interaction, and one for searching an image database. We present the workings of these systems along with observational results regarding their usability.
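    The virtual-plane metaphor can be sketched as a signed-distance test against a plane floating in front of the display. The plane placement, touch threshold, and names below are illustrative assumptions, not AirTouch's actual implementation.

```python
# Toy sketch of a virtual touch plane: a tracked fingertip "touches" when it
# pushes through a plane floating between the user and the display.
import numpy as np

PLANE_POINT = np.array([0.0, 0.0, 0.5])   # a point on the virtual plane (metres)
PLANE_NORMAL = np.array([0.0, 0.0, 1.0])  # plane faces the user along +z
TOUCH_EPS = 0.01                          # 1 cm dead zone to suppress jitter

def plane_distance(fingertip):
    """Signed distance of a 3D fingertip position to the plane;
    negative means the finger has crossed through it."""
    return float((fingertip - PLANE_POINT) @ PLANE_NORMAL)

def is_touching(fingertip):
    return plane_distance(fingertip) < -TOUCH_EPS

# Example: a fingertip approaching and then crossing the plane.
for z in (0.60, 0.52, 0.49, 0.47):
    tip = np.array([0.1, 0.0, z])
    print(f"z={z:.2f}m ->", "touch" if is_touching(tip) else "hover")
```

    Once a touch is registered, the fingertip's in-plane coordinates can be mapped to screen coordinates, recovering the familiar multitouch event model without physical contact.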